The CIPS-SIGHAN CLP 2014 Chinese Word Segmentation Bake-off
Abstract
This paper summarizes the SIGHAN 2014 Chinese Word Segmentation bake-off in several aspects, such as the dataset and the evaluation results. In addition, we analyze segmentation errors by instance and make suggestions for improving segmentation systems.

1 Goal of the Chinese word segmentation bake-off

Chinese word segmentation is the preliminary step of Chinese information processing; it is extremely important and should never be neglected. Owing to the properties of Chinese, the performance of word segmentation affects the subsequent analysis of Chinese text. As the organizers of the bake-off in Chinese word segmentation, we not only report the performance of all participating systems but also try to identify their weak points. In this way, participants can learn the strengths of their systems and recognize problems they had overlooked, so that they can improve their systems according to our feedback, which in turn promotes the study of Chinese word segmentation.
Similar resources
The CIPS-SIGHAN CLP 2012 Chinese Word Segmentation on MicroBlog Corpora Bakeoff
The CIPS-SIGHAN CLP 2012 Chinese Word Segmentation on MicroBlog Corpora Bakeoff was held in the autumn of 2012. This bake-off task of Chinese word segmentation is focused on the performance of Chinese word segmentation algorithms on MicroBlog corpora. 17 groups submitted 20 results, among which the best system has all the P, R and F values near 95%, and the average values of the 17 systems are ...
Introduction to BIT Chinese Spelling Correction System at CLP 2014 Bake-off
This paper describes the Chinese spelling correction system submitted by BIT to task 2 of the CLP 2014 Bake-off. The system mainly includes two parts: 1) An N-gram model is adopted to retrieve the non-words that are wrongly separated by word segmentation. The non-words are then corrected in terms of word frequency, pronunciation similarity, shape similarity and POS (part of speech) tag. 2) For wrong wor...
NCTU and NTUT's Entry to CLP-2014 Chinese Spelling Check Evaluation
This paper describes our Chinese spelling check system submitted to the SIGHAN Bake-off 2014 evaluation. The system's main components are still the conditional random field (CRF)-based word segmentation/part-of-speech (POS) tagger and the tri-gram language model (LM) used last year. But we tried to refine the misspelling rules and the decision-making threshold, and to improve LM rescoring speed, to reduce false ala...
Adaptive Chinese Word Segmentation with Online Passive-Aggressive Algorithm
In this paper, we describe our system for the CIPS-SIGHAN-2010 bake-off task of Chinese word segmentation, which focused on the cross-domain performance of Chinese word segmentation algorithms. We use the online passive-aggressive algorithm with domain-invariant information for cross-domain Chinese word segmentation.
Word Segmentation on Chinese Mirco-Blog Data with a Linear-Time Incremental Model
This paper describes the model we designed for the word segmentation bake-off on Chinese micro-blog data at the 2nd CIPS-SIGHAN joint conference on Chinese language processing. We present a linear-time incremental model for word segmentation in which rich features, including character-based features, word-based features and other possible features, can be easily employed. We report the perfo...
Publication date: 2014